Illumina metabarcoding of a soil fungal community
Abstract
Next-generation metabarcoding is becoming an indispensable tool in fungal community ecology. Here we tested Illumina metabarcoding, a method that generates shorter reads but achieves deeper sequencing than 454 metabarcoding approaches. We found that paired-end Illumina MiSeq data cover the full ITS1 in many fungal lineages and are suitable for environmental fungal community assessment. There was substantial read loss during data cleanup (78.6%), which, however, did not impede the analyses because of the large number of initial sequences (over 4 million). We observed high stochasticity in individual PCR reactions. A comparison of three repeated sets of PCR products showed that 58.5% of all fungal operational taxonomic units (OTUs) were missing from at least one of the three sets. Similarly, a comparison of three annealing temperatures showed that 63.6% of all fungal OTUs were missing from at least one of the three temperatures. These findings suggest that sampling of soil fungal communities becomes more exhaustive if repeated PCR products and PCR products generated at various annealing temperatures are combined. To analyze these issues we sampled 16 soil cores along a 270 cm transect in a meadow. In total we recovered 3320 fungal OTUs (based on a 95% similarity threshold). Distance decay analysis indicated that community similarity decreased slightly with geographical distance.
© 2013 Elsevier Ltd. All rights reserved.

High-throughput metabarcoding has been recognized as a powerful tool to study fungal communities. To date, most studies using next-generation sequencing of soil fungi are based on 454 pyrosequencing (e.g. Öpik et al., 2009; Buée et al., 2009; Dumbrell et al., 2011). Modern 454 reads cover >400 base pairs of fungal barcode markers such as ITS rDNA or partial 18S rDNA.
However, most fungal 454 metabarcoding studies use either ITS1 or ITS2 (Buée et al., 2009; Jumpponen and Jones, 2009; Jumpponen et al., 2010; Cordier et al., 2012; Danielsen et al., 2012; Xu et al., 2012; Zimmerman and Vitousek, 2012). Even if the full ITS is amplified, the most common 454 chemistry recovers either ITS1 or ITS2 in full length, while the sequenced fraction of the other ITS subfragment is too short for inferences (Bálint et al., 2013; Bazzicalupo et al., 2013). Fragments in the size range of ITS1 or ITS2 can be readily sequenced on the Illumina MiSeq platform. The Illumina platform provides sequencing at greater depth for a considerably lower price than 454, which promises a deeper characterization of fungal communities. While the potential of high-throughput metabarcoding for studying complex fungal communities is undisputed, our ability to understand the ecology of these communities has been hampered by insufficient sequencing depth and the high cost of 454 sequencing. For example, OTU rarefaction curves were often not saturated (e.g. Jumpponen et al., 2010), or processing of large numbers of replicates was not feasible due to the unfavorable throughput/price ratio (Bálint et al., 2013). To improve our understanding of the composition and distribution of complex fungal communities, we need to achieve deeper sequencing and analyze larger numbers of samples per study (Caporaso et al., 2012). Increasing the number of replicates allows a more thorough evaluation of methodological biases inherent in metabarcoding, e.g. the stochasticity of individual PCR reactions, and improves the statistical power of downstream data analyses. The aim of this study was therefore to test a method that generates a greater number of fungal metabarcodes at a lower cost.
P.-A. Schmidt et al. / Soil Biology & Biochemistry 65 (2013) 128–132

We specifically addressed the following questions: 1) What proportion of Illumina raw data is lost due to methodological issues and primer non-specificity? 2) Do replicate PCRs produce similar diversity estimates in metabarcoding studies? 3) Do different annealing temperatures affect the number and identity of OTUs recovered in Illumina metabarcoding? We performed the evaluations on a simple system: soil fungal communities along a short (270 cm) transect in a meadow. We show that Illumina metabarcoding is feasible for analyzing spatial community structure. The high sequencing depth may help to better understand the fine-scale complexity of soil fungal communities.

We collected 16 soil samples along a 270 cm transect in a low-input meadow located at Flörsheim, Germany (N50°00′26.482″ E8°23′58.502″; see Supplementary Fig. S1 for transect design). The soil is an alluvial silty clay with a pH (CaCl2) of 6.9 and an organic matter content of 2.9% (see also Supplementary Table T1). The site has not been treated with fertilizers or pesticides for at least 10 years. Surface soil cores were 1.6 cm in diameter and 10 cm deep. We removed the vegetation cover before sampling, homogenized the cores, and kept the samples at 20 °C until DNA extraction. After drying at 45 °C, we used 300 mg from each core for DNA extraction with the FastDNA SPIN Kit for Soil (MP Biomedicals, USA). We amplified the ITS1 region using the newly developed primer ITS1FI2 (5′-GAACCWGCGGARGGATCA-3′) and ITS2 (White et al., 1990a). ITS1FI2 overlaps in six positions with ITS1F (White et al., 1990b), but is located closer to the end of the 18S. We used combinatorial primer labeling to identify samples after multiplexed sequencing (Gloor et al., 2010). Amplifications were carried out in a total volume of 20 µl using 50 ng of DNA, 4 µl of HOT MOLPol Blend Master Mix (Molegene, Germany), and 0.5 µM of each of the forward and reverse primers.
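With combinatorial labeling, each sample is identified by the pair of labels on its forward and reverse primers, so a few labels can mark many samples. The Python sketch below illustrates that idea together with IUPAC-aware matching of the degenerate ITS1FI2 primer (W = A/T, R = A/G), requiring a perfect match after skipping the first base. It is a minimal sketch under stated assumptions: the 6-bp labels and the sample sheet are hypothetical, the reverse primer is taken to be the standard ITS2 sequence, and checking the reverse label at the end of the assembled read is an assumption of this sketch. The study itself demultiplexed with fqgrep, whose matching logic is not reproduced here.

```python
import re

# Regex classes for the IUPAC codes used below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

FWD_PRIMER = "GAACCWGCGGARGGATCA"    # ITS1FI2, as given in the text
REV_PRIMER = "GCTGCGTTCTTCATCGATGC"  # standard ITS2 primer (White et al., 1990)

# Hypothetical labels and sample sheet, for illustration only.
FWD_LABELS = {"F1": "ACGAGT", "F2": "TGCAGA"}
REV_LABELS = {"R1": "CTTGCA", "R2": "GATCAG"}
SAMPLES = {("F1", "R1"): "core01", ("F1", "R2"): "core02",
           ("F2", "R1"): "core03", ("F2", "R2"): "core04"}


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous (A/C/G/T) sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_regex(seq: str) -> str:
    """Translate a possibly degenerate sequence into a regex pattern."""
    return "".join(IUPAC[base] for base in seq)


def assign_sample(merged_read: str, skip_first: int = 1):
    """Return the sample name for an assembled read, or None if unassigned.

    The first `skip_first` bases are ignored; the forward label + primer must then
    match perfectly (degenerate positions allowed), and the read must end with the
    reverse complement of the reverse label + primer (assumption of this sketch).
    """
    read = merged_read[skip_first:]
    fwd_hits = [name for name, label in FWD_LABELS.items()
                if re.match(to_regex(label + FWD_PRIMER), read)]
    rev_hits = [name for name, label in REV_LABELS.items()
                if read.endswith(revcomp(label + REV_PRIMER))]
    if len(fwd_hits) != 1 or len(rev_hits) != 1:
        return None  # unlabeled, mismatched, or ambiguous read
    return SAMPLES.get((fwd_hits[0], rev_hits[0]))


if __name__ == "__main__":
    read = ("N" + "ACGAGT" + "GAACCAGCGGAGGGATCA" + "CCTTAGGA"
            + revcomp("GATCAG" + REV_PRIMER))
    print(assign_sample(read))  # core02
```

Because each sample is a label pair, n forward and m reverse labels can distinguish up to n × m samples; this is what makes labeling all five parallel PCR series of all 16 cores practical.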
PCR conditions were 15 min at 95 °C, followed by 35 cycles of 30 s at 95 °C, 30 s at either 52 °C, 55 °C or 58 °C, and 30 s at 72 °C, with a final elongation at 72 °C for 5 min. The PCR with 52 °C annealing temperature was repeated three times. Amplicons from the five parallel PCR runs (3 × 52 °C, 1 × 55 °C, 1 × 58 °C) were individually labeled to estimate the effect of repeated PCRs and annealing temperatures on richness recovery. Purification was done with Agencourt AMPure XP SPRI magnetic beads. We quantified the PCR products with a Qubit 2.0 Fluorometer and the Qubit dsDNA HS Assay Kit (both Invitrogen), then normalized and pooled them. Paired-end sequencing (2 × 150 bp) was carried out on an Illumina MiSeq sequencer at the Biomedical Genomics Center of the University of Minnesota, U.S.A.

We assembled paired-end reads using PandaSeq (Masella et al., 2012) and filtered out all sequences containing “N”s. Mismatches between the overlapping parts of the forward and reverse reads were resolved in favor of the base call with the higher sequencer-assigned quality score. Read quality was checked with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/, accessed on 1 August 2012). After demultiplexing the reads with fqgrep (https://github.com/indraniel/fqgrep, accessed on 1 August 2012), we kept only sequences starting with a perfectly matching labeled primer sequence (omitting the first base pair, N-1). Initial denoising was performed by clustering at 97% similarity with the heuristic clustering algorithm UCLUST 2.1, implemented in USEARCH v.6.0.203 (Edgar, 2010). The longest sequences served as seeds for the preclustering. De novo chimera detection was performed with the UCHIME algorithm (Edgar et al., 2011). OTU picking was performed at 95% sequence similarity with a modified version of the OTUpipe 1.1.9 wrapper for UCLUST (http://drive5.com/otupipe/, accessed on 15 August 2012).
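OTU picking of this kind is a greedy centroid clustering: unique sequences are processed from most to least abundant, each one joins an existing centroid within the identity threshold or otherwise founds a new OTU, and OTUs below 10 reads are discarded afterwards. The Python sketch below illustrates that logic at a 95% threshold. It is a simplification, not UCLUST or OTUpipe: identity is approximated here as 1 − (edit distance / length of the longer sequence), each sequence joins the first qualifying centroid, and preclustering and chimera removal are left out; UCLUST uses its own alignment heuristics and identity definition.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # match / substitution
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Approximate pairwise identity: 1 - edits / length of the longer sequence."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def greedy_otus(reads, threshold=0.95, min_size=10):
    """Greedy abundance-ordered centroid clustering; returns {centroid: reads}."""
    counts = Counter(reads)
    # Most abundant unique sequences first, so they become the seeds.
    uniques = sorted(counts, key=lambda s: (-counts[s], s))
    sizes = {}  # centroid -> total reads in its OTU, in order of creation
    for seq in uniques:
        for centroid in sizes:
            if identity(seq, centroid) >= threshold:
                sizes[centroid] += counts[seq]
                break
        else:
            sizes[seq] = counts[seq]  # no centroid close enough: new OTU
    # Drop small OTUs, mirroring the exclusion of OTUs with <10 reads.
    return {c: n for c, n in sizes.items() if n >= min_size}
```

Sorting by abundance means that the centroid of each OTU is its most common sequence type, which is the representative later used for taxonomic annotation.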
The most abundant sequence types served as clustering seeds. Most sister species in fungi differ by less than 2–3% (Schoch et al., 2012). However, we opted for a more conservative clustering threshold to account for intragenomic ITS variability, which is present in many fungal lineages (e.g. Simon and Weiss, 2008; Kovács et al., 2011) and which may exceed the similarity thresholds commonly employed for OTU delimitation (Lindner and Banik, 2011). We excluded OTUs with fewer than 10 reads, following a recommendation in the manual of the OTUpipe wrapper (http://drive5.com/otupipe/otupipe_manual1.1.pdf). The centroid sequence of each cluster (a representative sequence from the most common sequence type in each OTU) was used for the annotation of fungal reads. Centroid sequences are provided as FASTA files in Supplementary Material 2. We used only fungal clusters for downstream analyses. We blasted (Altschul et al., 1997) the OTU-representative sequences against the entire GenBank nucleotide database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt*, downloaded on 18 October 2012). We parsed the BLAST outputs in MEGAN 4 (Huson et al., 2011) for taxonomic assignment (min. support: 1, min. score: 200, top percent: 5) and retained clusters of supported fungal origin. The OTU-representative sequences and the abundance of OTUs in each PCR reaction are provided as Supplementary Material 3.

We estimated sample richness by computing rarefaction curves for samples amplified in the five parallel PCR series (3 × 52 °C, 1 × 55 °C, 1 × 58 °C) and for all samples combined. Rarefaction curves were calculated in MOTHUR v.1.22.2 (Schloss et al., 2009). We combined the reads obtained from the parallel PCRs and randomly sampled 20,000 reads for each of the 16 samples to account for differences in sequencing depth. We analyzed the spatial structure of fungal assemblages from the 16 soil cores in R v.2.15.2 (R Development Core Team, 2012).
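Subsampling every sample to the same depth (20,000 reads in this study) removes the effect of unequal sequencing depth on observed richness, and repeating the draw at increasing depths traces a rarefaction curve. The Python sketch below illustrates random subsampling without replacement from one sample's OTU count table. It is an illustration of the idea, not MOTHUR's implementation, and the OTU names and counts in the usage block are made up.

```python
import random
from collections import Counter


def subsample(otu_counts: dict, depth: int, seed: int = 0) -> Counter:
    """Draw `depth` reads at random without replacement from one sample."""
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    return Counter(random.Random(seed).sample(pool, depth))


def rarefaction_curve(otu_counts: dict, depths, seed: int = 0):
    """Observed OTU richness after subsampling to each depth (one draw each)."""
    return [len(subsample(otu_counts, d, seed)) for d in depths]


if __name__ == "__main__":
    # Made-up sample: two common OTUs plus 100 rare ones (20 reads each).
    sample = {"OTU_1": 15_000, "OTU_2": 8_000,
              **{f"OTU_{i}": 20 for i in range(3, 103)}}
    even = subsample(sample, 20_000)
    print(sum(even.values()), len(even))
    print(rarefaction_curve(sample, [100, 1_000, 10_000, 20_000]))
```

A single draw per depth is noisy; averaging richness over several seeds gives a smoother curve.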
We calculated a Bray–Curtis community distance matrix using the vegdist function from the vegan v2.0-3 package (Oksanen et al., 2012). We correlated the community distance matrix with the Euclidean distances between the 16 cores using a Mantel test (9999 permutations). To test whether community similarities decrease significantly with increasing spatial distance, we fitted a linear regression of the log-transformed Sørensen similarities of the fungal communities against their spatial distance (Nekola and White, 1999). Given the small sample size, we estimated the standard errors of the intercept and slope by jackknifing the linear model with a script modified from Millar et al. (2011). Sequence data were deposited in the European Nucleotide Archive (ENA) under accession PRJEB3999.

We received a total of 4,280,264 raw reads from the sequencer. Read number decreased to 3,790,739 after paired-end assembly, to 2,640,085 after demultiplexing, and to 2,528,650 (using a 97% preclustering threshold) after chimera detection. Clustering at 95% sequence similarity assigned 2,304,935 reads to OTUs. Taxonomic assignment of the OTUs picked at 95% sequence similarity identified 917,269 reads as fungal. Of the non-fungal sequences, 46.1% (627,160) were of plant origin, 48.4% (658,198) could not be assigned, and 5.4% (73,664) belonged to other eukaryotes and bacteria. Each of the 16 soil samples contained 60,905 to 237,052 total reads, 20,658 to 97,377 fungal reads, and 1550 to 2963 fungal OTUs (Supplementary Table 2). Overall, we recorded 3320 fungal OTUs from the 16 soil cores combined; these counts exclude OTUs recovered with <10 reads. The 3320 OTUs include Ascomycota (78.14%), Basidiomycota (10.24%), Glomeromycota (3.61%), Chytridiomycota (0.36%), Blastocladiomycota (0.03%), and other fungal lineages (7.62%). The three repeated sets of PCR amplifications at 52 °C annealing temperature yielded comparable numbers of sequencing reads (240,511, 182,624 and 143,211) and fungal OTUs (Fig. 1).
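The loss percentages quoted in the abstract and Discussion (78.6% overall, ~11.4% at paired-end assembly, ~4% chimeric, ~60.2% non-fungal) follow directly from the read counts above. The short Python check below recomputes them; the step labels are paraphrased, and the chimera share comes out as 4.2% when expressed relative to the demultiplexed reads.

```python
# Read counts as reported in the Results; step labels are paraphrased.
STEPS = [
    ("raw reads", 4_280_264),
    ("after paired-end assembly", 3_790_739),
    ("after demultiplexing", 2_640_085),
    ("after chimera detection", 2_528_650),
    ("clustered into OTUs", 2_304_935),
    ("assigned to fungi", 917_269),
]


def funnel(steps):
    """Return (step, reads, % lost relative to the previous step, % of raw kept)."""
    raw = prev = steps[0][1]
    rows = []
    for name, reads in steps:
        rows.append((name, reads, 100 * (prev - reads) / prev, 100 * reads / raw))
        prev = reads
    return rows


if __name__ == "__main__":
    for name, reads, lost, kept in funnel(STEPS):
        print(f"{name:<28}{reads:>11,}{lost:>8.1f}%{kept:>8.1f}%")
```

Only 21.4% of the raw reads end up as fungal reads, which is the 78.6% overall loss reported in the abstract.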
The identity of the OTUs deviated greatly between the three sets of PCR reactions. Only 1219 (41.5%) of the 2937 OTUs recovered in this PCR series were found in all three sets of PCR reactions (Fig. 2A).

Fig. 1. Rarefaction curves of community richness estimates at different PCR annealing temperatures. Thick continuous line: OTUs from all PCR amplifications combined; thin continuous lines: OTUs from three sets of repeated PCR amplifications at 52 °C annealing temperature; dashed line: OTUs from PCR amplification at 55 °C annealing temperature; dotted line: OTUs from PCR amplification at 58 °C annealing temperature.

Increasing the annealing temperature slightly lowered the number of fungal OTUs recovered (Fig. 1). However, many of the OTUs found at 55 °C and 58 °C were not recovered at 52 °C. Only 1208 (36.4%) of the 3320 OTUs recovered in this study were found at all three annealing temperatures (52 °C, 55 °C and 58 °C; Fig. 2B).

Fig. 2. Effects of repeated sets of PCRs and annealing temperature on fungal OTU recovery. Venn diagrams show overlapping and non-overlapping OTUs recovered (A) with repeated PCR reactions at 52 °C, and (B) with PCR reactions at different annealing temperatures. The areas are not fully proportional to the numbers of OTUs.

The Mantel test showed a weak but significant correlation between the spatial distances of the soil samples and their community distances (r = 0.232, p = 0.015). The linear regression of the log-transformed Sørensen community similarities against the spatial distance between cores shows the decay of similarity with distance (intercept a = −0.92, s.e. 0.049; slope b = −0.00045, s.e. 0.00017; Fig. 3). The intercept is expressed in ln(similarity), and the slope as the change in ln(similarity) per 1 cm. The jackknifed linear models predict that community similarity decreases by 50% at 1562 cm (s.e. 590 cm).

Fig. 3. Distance decay in the compositional similarity (measured as ln of the Sørensen index) of soil fungal communities. Best-fit logarithmic relationships are shown (log-linear on the linear plot). A Mantel test was used to test the strength and significance of correlations (r = 0.232, p = 0.015; see Results).

The Illumina MiSeq run provided a large number of sequences covering the entire ITS1. Although 454 generally produces longer reads, most 454 ITS studies use either ITS1 or ITS2 (examples relying on ITS1: Buée et al., 2009; Jumpponen and Jones, 2009; Jumpponen et al., 2010; Cordier et al., 2012; Danielsen et al., 2012; Xu et al., 2012; Yu et al., 2012; Zimmerman and Vitousek, 2012; Bálint et al., 2013). This makes Illumina MiSeq metabarcoding a viable alternative to 454. Furthermore, a recent comparison of 454 and Illumina metabarcoding suggests that both sequencing technologies recover similar microbial communities (Luo et al., 2012). However, Illumina sequencing recovers considerably more sequences at a lower price, which provides additional flexibility in metabarcoding studies using Illumina MiSeq instead of 454.

Illumina metabarcoding datasets can readily be analyzed on desktop computers. All analyses in this study were run on a 64-bit computer (4 processing cores, 3.1 GHz, 16 GB RAM). The computation took ~20 min for paired-end assembly (a straightforward procedure relying on an existing program, Masella et al., 2012), ~1.5 h for demultiplexing, ~2 h for preclustering and chimera detection, ~3 h for OTU picking, and 5–10 h for BLAST searches of OTU-representative sequences against the GenBank database.

We observed considerable data loss during data cleanup, mostly due to the paired-end assembly and the generality of the primers. Numerous reads (~11.4%) were discarded during paired-end assembly because they had no compatible overlaps or contained ambiguity characters (“N”s). We also recorded a high proportion of non-fungal reads (~60.2%). There is an important trade-off between the specificity of metabarcoding primers and the recovery of multiple fungal lineages (Toju et al., 2012); we opted for relatively unspecific primers to recover more fungal diversity. About 4% of the sequences were identified as chimeric. A recent study suggests that the taxonomic complexity of environmental DNA samples increases the prevalence of chimeric sequences (Fonseca et al., 2012). Dealing with read losses is another instance where the throughput of the Illumina MiSeq becomes very useful: we can afford to discard reads and still characterize the target organisms in sufficient depth. Methodology-related read losses must nonetheless be considered when planning metabarcoding projects.

Recovery of OTU richness varied between repeated PCR reactions and between annealing temperatures. Of the OTUs recovered in the three parallel PCR series at 52 °C, only 42% were shared by all three series (Fig. 2). We therefore recommend pooling multiple repeated PCRs to offset the stochasticity of individual PCR reactions and to ensure a more exhaustive sampling of community diversity. Using multiple annealing temperatures also improves the recovery of OTU richness. Previous studies suggest that low annealing temperatures should be used to maximize taxon recovery, because low annealing temperatures may help to overcome primer-template mismatches (Ishii and Fukui, 2001; Acinas et al., 2005; Sipos et al., 2007). We confirm that PCRs at lower annealing temperatures recover slightly more OTUs (Fig. 1).
However, OTUs recovered at higher annealing temperatures are not a strict subset of OTUs recovered at lower annealing temperatures: 23.3% of the OTUs were found only in the 52 °C PCR series, 6.3% only in the 55 °C series, and 3.5% only in the 58 °C series. Higher annealing temperatures likely enhance primer binding if secondary structures are present in the binding sites (Fonseca et al., 2012). High annealing temperatures may also favor rare ITS variants with perfectly matching primer binding sites over abundant but imperfectly matching variants. Using multiple annealing temperatures may thus decrease primer binding bias and allow the reconstruction of more complete fungal communities. The stochasticity in OTU recovery does not necessarily interfere with ecological signals in the data, but may reduce the chances of detecting these signals. To obtain the most complete estimate of environmental fungal diversity with a single primer set, it is thus advisable to combine repeated PCR products and PCR products obtained at different annealing temperatures.

Deep sequencing reveals the distance dependence of soil-borne fungal communities. Earlier studies reported the stochastic local assembly of root-associated fungal communities using Sanger sequencing (Tedersoo et al., 2003) or 454 metabarcoding (Blaalid et al., 2012; Danielsen et al., 2012; Lekberg et al., 2012). Our study did not focus exclusively on root-associated fungi, but on soil fungi in general. Our results show a slight but significant distance decay in the similarity of soil-borne fungal communities (Fig. 3). This suggests the presence of spatial community structure over distances of only a few meters. Overall, we regard Illumina metabarcoding of complex fungal communities as a fully feasible and highly promising approach.
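The distance-decay analysis fits ln(S) = a + b·d, where S is the Sørensen similarity between two cores and d their distance in cm. Reading "decreases by 50%" as S(d)/S(0) = 0.5, the half-distance is d50 = ln(0.5)/b. The Python sketch below implements the Sørensen index on OTU presence sets, an ordinary least-squares fit, a generic leave-one-out jackknife of intercept and slope, and d50. It is not the script modified from Millar et al. (2011), whose details are not given here; also, pairwise similarities are not independent observations, so a jackknife that drops single pairs is only a rough guide. Taking the reported slope as b = −0.00045 (negative, since similarity declines with distance) gives d50 ≈ 1540 cm, close to, but not identical with, the 1562 cm derived from the jackknifed models.

```python
import math


def sorensen(a: set, b: set) -> float:
    """Sørensen similarity of two OTU presence sets: 2|A∩B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def jackknife_se(x, y):
    """Leave-one-out jackknife standard errors of (intercept, slope).

    x and y are lists; each leave-one-out fit needs at least two distinct x values.
    """
    n = len(x)
    fits = [fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    errors = []
    for k in (0, 1):  # 0 = intercept, 1 = slope
        vals = [fit[k] for fit in fits]
        mean = sum(vals) / n
        errors.append(math.sqrt((n - 1) / n * sum((v - mean) ** 2 for v in vals)))
    return tuple(errors)


def half_distance(b: float) -> float:
    """Distance at which S drops to 50% of S(0) under ln(S) = a + b*d."""
    return math.log(0.5) / b


if __name__ == "__main__":
    print(round(half_distance(-0.00045)))  # 1540
```

The half-distance depends only on the slope, so the large jackknife standard error of the slope translates directly into the wide uncertainty (s.e. 590 cm) around the 50% decay distance.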
Given the considerably lower cost per base pair and the higher sequencing depth compared to 454, Illumina metabarcoding allows more flexibility in experimental design. The throughput of the platform facilitates sample multiplexing, more detailed fungal community characterization, and a more thorough recovery of fungal richness. These methodological improvements may lead to the discovery of patterns in the structure of soil fungal communities that go unnoticed with other approaches.

Author contributions
PS, MB, JR, and IS conceived the study; PS and CB took the samples; PS conducted the laboratory work; PS and MB analyzed the data; PS produced the figures, tables and supporting materials; PS, MB and IS interpreted the results and wrote the manuscript; BG wrote Python scripts; all authors edited and approved the manuscript.
Publication date: 2013